What is claimed is:

1. A method for multiple store buffer forwarding in a system with a restrictive memory model, the method comprising:
executing a plurality of store instructions;
executing a load instruction;
determining that a memory region addressed by the load instruction matches a cacheline address in a memory;
determining that data stored by the plurality of store instructions completely covers the memory region addressed by the load instruction; and
transmitting a store forward is OK signal.
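
As an illustration only, the two determinations recited in claim 1 can be sketched in software. This is a minimal model under stated assumptions, not the claimed hardware: the 64-byte line size, the `BufferEntry` class, and the `try_forward` function are hypothetical names introduced here, and the per-byte valid bitmap is one possible way to track which bytes the stores have written.

```python
from dataclasses import dataclass, field

CACHELINE_BYTES = 64  # assumed line size; the claims do not fix one


@dataclass
class BufferEntry:
    """Hypothetical buffer entry holding pending store data for one cacheline."""
    line_addr: int
    data: bytearray = field(default_factory=lambda: bytearray(CACHELINE_BYTES))
    valid: int = 0  # bit i set means byte i of the line was written by a store

    def store(self, addr: int, payload: bytes) -> None:
        """Merge one store into the entry and mark its bytes valid."""
        offset = addr - self.line_addr
        if offset < 0 or offset + len(payload) > CACHELINE_BYTES:
            raise ValueError("store falls outside this cacheline")
        self.data[offset:offset + len(payload)] = payload
        self.valid |= ((1 << len(payload)) - 1) << offset


def try_forward(entry: BufferEntry, addr: int, size: int):
    """Return (forward_ok, data); data is None when forwarding is refused."""
    line_addr = addr & ~(CACHELINE_BYTES - 1)
    if line_addr != entry.line_addr:
        return False, None          # load does not match the cacheline address
    offset = addr - line_addr
    if offset + size > CACHELINE_BYTES:
        return False, None          # load would cross the line boundary
    wanted = ((1 << size) - 1) << offset
    if (entry.valid & wanted) != wanted:
        return False, None          # stores do not completely cover the load
    return True, bytes(entry.data[offset:offset + size])


# Two 4-byte stores to contiguous locations, then an 8-byte load over both.
entry = BufferEntry(line_addr=0x1000)
entry.store(0x1008, b"\x11\x22\x33\x44")
entry.store(0x100C, b"\x55\x66\x77\x88")
ok, value = try_forward(entry, 0x1008, 8)
```

In this sketch the returned flag plays the role of the store forward is OK signal: it is set only when the address match and the full-coverage check both succeed.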

2. The method of claim 1, wherein executing the plurality of store instructions comprises:
performing a plurality of store operations to store a plurality of data values in contiguous memory locations in the memory, wherein the size of the contiguous memory locations equals the size of the memory region addressed by the load instruction.

3. The method of claim 2, wherein executing a load instruction comprises:
loading the data from the contiguous memory locations in the memory; and
generating the store forward is OK signal.

4. The method of claim 3, wherein loading the data from the contiguous memory locations in the memory begins after performing the plurality of store operations begins, and loading the data from the contiguous memory locations in the memory completes before the plurality of store operations become globally observed in the system.

5. The method of claim 1, wherein executing the load instruction comprises:
loading the data from a write combining buffer; and
generating the store forward is OK signal.

6. The method of claim 1, wherein determining that a memory region addressed by the load instruction matches a cacheline address in a memory comprises:
comparing an address of the memory region and the cacheline address; and
determining that the address of the memory region is the same address as the cacheline address.

7. The method of claim 1, wherein determining that the data stored by the plurality of store instructions completely covers the memory region addressed by the load instruction comprises:
determining that a size of the data stored by the plurality of store instructions equals a size of the memory region addressed by the load instruction.

8. The method of claim 1, further comprising:
terminating, if an address of the memory region and the cacheline address in the memory are different.

9. The method of claim 1, further comprising:
re-executing the load instruction, if the memory region is incompletely covered by the data stored by the plurality of store instructions.
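
For illustration, the fallback paths of claims 8 and 9 can be viewed as a three-way decision. The sketch below is an assumption-laden model, not the claimed implementation: `Outcome`, `classify_load`, and the bitmask encoding of covered bytes are hypothetical names and choices introduced here.

```python
from enum import Enum


class Outcome(Enum):
    FORWARD = "forward"            # both determinations succeeded
    TERMINATE = "terminate"        # claim 8: addresses differ
    REEXECUTE = "re-execute load"  # claim 9: region incompletely covered


def classify_load(load_line_addr: int, entry_line_addr: int,
                  valid_bits: int, load_mask: int) -> Outcome:
    """Decide what happens to a load, given hypothetical per-byte bitmasks.

    valid_bits: bit i set means byte i was written by the pending stores.
    load_mask:  bit i set means the load reads byte i.
    """
    if load_line_addr != entry_line_addr:
        return Outcome.TERMINATE
    if (valid_bits & load_mask) != load_mask:
        return Outcome.REEXECUTE
    return Outcome.FORWARD
```

Checking the address first mirrors the claim order: coverage is only meaningful once the load is known to target the same cacheline as the stored data.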

10. The method of claim 1, wherein intermediate results from the plurality of store instructions are invisible to other concurrent processes.

11. The method of claim 1, wherein the method operates within the restrictive memory model.

12. A machine-readable medium having stored thereon a plurality of executable instructions for multiple store buffer forwarding in a system with a restrictive memory model, the plurality of instructions comprising instructions to:
execute a plurality of store instructions;
execute a load instruction;
determine that a memory region addressed by the load instruction matches a cacheline address in a memory;
determine that data stored by the plurality of store instructions completely covers the memory location in the memory specified by the load instruction; and
transmit a store forward is OK signal.

13. The machine-readable medium of claim 12, wherein the execute the plurality of store instructions instruction comprises an instruction to:
perform a plurality of store operations to store a plurality of data values in contiguous memory locations in the memory, wherein the size of the contiguous memory locations equals the size of the memory region addressed by the load instruction.

14. The machine-readable medium of claim 13, wherein the execute a load instruction instruction comprises instructions to:
load the data from the contiguous memory locations in the memory; and
generate the store forward is OK signal.

15. The machine-readable medium of claim 14, wherein the load data from the contiguous memory locations in the memory instruction begins executing after the perform the plurality of store operations instruction begins executing, and the load data from the contiguous memory locations in the memory instruction completes executing before the plurality of store operations become globally observed in the system.

16. The machine-readable medium of claim 12, wherein the execute the load instruction comprises instructions to:
load the data from a write combining buffer; and
generate the store forward is OK signal.

17. The machine-readable medium of claim 12, wherein the determine that a memory region addressed by the load instruction matches a cacheline address in a memory instruction comprises instructions to:
compare the address of the memory region and the cacheline address; and
determine that the address of the memory region is the same address as the cacheline address.

18. The machine-readable medium of claim 12, wherein the determine that data stored by the plurality of store instructions completely covers the memory location in the memory specified by the load instruction instruction comprises an instruction to:
determine that a size of the data stored by the plurality of store instructions equals a size of the memory region addressed by the load instruction.

19. The machine-readable medium of claim 12, further comprising an instruction to:
terminate, if an address of the memory region and the cacheline address in the memory are different.

20. The machine-readable medium of claim 12, further comprising an instruction to:
re-execute the load instruction, if the memory region is incompletely covered by the data stored by the plurality of store instructions.

21. The machine-readable medium of claim 12, wherein the execute the plurality of store instructions instruction comprises an instruction to:
execute the plurality of store instructions to produce intermediate results that are invisible to other concurrent processes.

22. The machine-readable medium of claim 12, wherein the plurality of executable instructions operate within the restrictive memory model.

23. A processor system, comprising:
a processor;
a system memory coupled to the processor; and
a non-volatile memory coupled to the processor in which is stored an article of manufacture including instructions adapted to be executed by the processor, the instructions which, when executed, encode instructions in an instruction set to enable multiple store buffer forwarding in a system with a restrictive memory model, the article of manufacture comprising instructions to:
execute a plurality of store instructions;
execute a load instruction;
determine that a memory region addressed by the load instruction matches a cacheline address in a memory;
determine that data stored by the plurality of store instructions completely covers the memory location in the memory specified by the load instruction; and
transmit a store forward is OK signal.

24. The processor system of claim 23, the processor comprising:
a write combining buffer, the write combining buffer including:
a comparator, the comparator being configured to receive and compare an incoming load operation target address with all cacheline addresses of existing write combining buffer entries;
an address and data buffer coupled to the comparator;
a data valid bits buffer coupled to the address and data buffer;
a multiplexer coupled to the data valid bits buffer; and
a comparison circuit coupled to the multiplexer.

25. The processor system of claim 24, the multiplexer being configured to:
receive a byte valid vector from the data valid bits buffer;
receive address bits from the load operation and output valid bits;
select a group of valid bits from the byte valid vector; and
output the group of valid bits.

26. The processor system of claim 24, the comparison circuit being configured to:
receive the group of valid bits;
receive an incoming load operation byte mask;
determine that it is acceptable to forward the data using the group of valid bits and the incoming load operation byte mask; and
produce a forward OK signal.
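
As a rough behavioral model of the datapath in claims 24 to 26, the comparator, multiplexer, and comparison circuit can each be written as a small function. Everything concrete here is an assumption made for illustration: the 64-byte line, the 8-byte group width, the use of line numbers as entry tags, and the function names `match_entry`, `select_valid_group`, and `forward_ok` do not come from the claims.

```python
LINE_BYTES = 64   # assumed cacheline size
GROUP_BYTES = 8   # assumed width of the group the multiplexer selects


def match_entry(load_addr: int, entry_line_numbers: list):
    """Comparator: compare the load target against every entry's line number.

    Returns the index of the matching entry, or None if no entry matches.
    """
    line = load_addr // LINE_BYTES
    for index, entry_line in enumerate(entry_line_numbers):
        if entry_line == line:
            return index
    return None


def select_valid_group(byte_valid_vector: int, load_addr: int) -> int:
    """Multiplexer: use load address bits to pick one aligned group of valid bits."""
    group = (load_addr % LINE_BYTES) // GROUP_BYTES
    return (byte_valid_vector >> (group * GROUP_BYTES)) & ((1 << GROUP_BYTES) - 1)


def forward_ok(valid_group: int, load_byte_mask: int) -> bool:
    """Comparison circuit: every byte the load reads must be marked valid."""
    return (valid_group & load_byte_mask) == load_byte_mask


# Bytes 8..15 of line 0x40 are valid; an 8-byte load at 0x1008 reads exactly those.
entry_index = match_entry(0x1008, [0x3F, 0x40])
valid_group = select_valid_group(0xFF00, 0x1008)
signal = forward_ok(valid_group, 0xFF)
```

Splitting the check this way keeps the final comparison narrow: only one group of valid bits, rather than the whole byte valid vector, has to be compared against the load byte mask.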

27. The article of manufacture of claim 23, wherein the execute the plurality of store instructions instruction comprises an instruction to:
perform a plurality of store operations to store a plurality of data values in contiguous memory locations in the memory, wherein the size of the contiguous memory locations equals the size of the memory region addressed by the load instruction.

28. The machine-readable medium of claim 27, wherein the execute a load instruction instruction comprises instructions to:
load the data from the contiguous memory locations in the memory; and
generate the store forward is OK signal.

29. The machine-readable medium of claim 28, wherein the load data from the contiguous memory locations in the memory instruction begins executing after the perform the plurality of store operations instruction begins executing, and the load data from the contiguous memory locations in the memory instruction completes executing before the plurality of store operations become globally observed in the system.

30. The processor system of claim 23, wherein said processor is implemented as a multi-processor having associated with each said multi-processor a separate set of hardware resources.






